>>The differences between text files on Mac, Unix and DOS are all in the
>>end-of-line characters.
>
>Another minor difference is end-of-file handling. MS-DOS text files end
>at the first control-Z character, regardless of any other data in the
>file after the control-Z. Macintosh files just use the file length
>recorded in the disk directory.
Actually, this is no longer true. No DOS or Windows program worth
its salt depends on finding a ctrl-z character to mark EOF. Of
course, the number of DOS/Windows programs worth their salt is an open
question in this group (:-).
In fact, the only program I can think of that uses ctrl-z to mark
the end of file is COMMAND.COM, whose COPY command will stop upon
finding a ctrl-z in the input (unless you specify /b). Just one
more charming DOS idiotsyncracy.
I think the original reason for using ctrl-z was compatibility with
CP/M, which only kept track of the number of 128-byte blocks allocated
to a file, but not the actual file length.
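
A minimal Python sketch of the convention described above, assuming a CP/M-style writer that pads the last 128-byte record with ctrl-z and a reader that stops at the first ctrl-z (as COPY does without /b). The function names are made up for illustration.

```python
CTRL_Z = b"\x1a"   # ASCII SUB, used as the end-of-text marker
CPM_RECORD = 128   # CP/M tracked file size in 128-byte records


def text_before_ctrl_z(data: bytes) -> bytes:
    """Return the bytes before the first ctrl-z, or all of them if there is none."""
    end = data.find(CTRL_Z)
    return data if end == -1 else data[:end]


def pad_to_cpm_records(text: bytes) -> bytes:
    """Pad text with ctrl-z up to a whole number of 128-byte records."""
    padding = -len(text) % CPM_RECORD
    return text + CTRL_Z * padding


on_disk = pad_to_cpm_records(b"HELLO\r\n")
print(len(on_disk))                 # a whole number of 128-byte records
print(text_before_ctrl_z(on_disk))  # the text without the padding
```

Anything after the first ctrl-z is silently dropped by such a reader, which is why binary data needs the /b treatment.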
+++++++++++++++++++++++++++
From jwbaxter@olympus.net (John W. Baxter)
Date: Wed, 22 Nov 1995 18:52:04 -0800
Organization: Townsend Communications
In article <scipioni-2011951231090001@scipioni.nai.net>, scipioni@nai.net
(Steve Scipioni) wrote:
> What is the difference between UNIX, DOS, and Mac text format? I thought
> that ASCII was ASCII.
ASCII is ASCII, and only defines the character values from 0x00 through
0x7F. End-of-line is a separate issue...all ASCII defines is the
characters, including CR as 0x0D, LF as 0x0A, and the file, group, record,
and unit separators (FS, GS, RS, US) at 0x1C through 0x1F (left over from
punched paper tape, where a "line" was one row across the tape, i.e.
usually one character).
Line ends aren't part of ASCII. Neither are the characters from 0x80
through 0xFF.
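
As a rough illustration in Python: the code points below are the standard ASCII values, while the table name and the helper function are made up for this sketch.

```python
# A few control characters and their standard ASCII code points.
CONTROLS = {
    "LF": 0x0A,  # line feed
    "CR": 0x0D,  # carriage return
    "FS": 0x1C,  # file separator
    "GS": 0x1D,  # group separator
    "RS": 0x1E,  # record separator
    "US": 0x1F,  # unit separator
}


def is_pure_ascii(data: bytes) -> bool:
    """True if every byte is in the 7-bit range 0x00..0x7F."""
    return all(byte <= 0x7F for byte in data)


for name, code in CONTROLS.items():
    print(f"{name} = 0x{code:02X}")

print(is_pure_ascii(b"plain text\r\n"))  # CR and LF are ASCII characters
print(is_pure_ascii(b"caf\x8e"))         # 0x8E is outside ASCII
```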
--John
--
"This item is not available because it cannot be removed."
John W. Baxter Port Ludlow, WA jwbaxter@olympus.net
+++++++++++++++++++++++++++
From d_spacey@icrf.icnet.uk (Dylan the Hippy Wabbit)
Date: 27 Nov 1995 12:51:53 GMT
Organization: Imperial Cancer Research Fund
In article <scipioni-2011951231090001@scipioni.nai.net>, scipioni@nai.net
(Steve Scipioni) wrote:
> > What is the difference between UNIX, DOS, and Mac text format? I thought
> > that ASCII was ASCII.
A Mac reading PC-clone text puts a black box at every LF character;
Windows reading Mac text tries to put it all on one line. Why doesn't PC
Exchange deal with translation of a text file automatically?
Of course Mac, DOS/Windows, and Un*x (who owns that this week?) aren't
all the systems in the world. I gather many IBM mainframes use EBCDIC,
which bears no resemblance to ASCII whatsoever.
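
To see how different the byte values are, here is a small Python sketch. It assumes code page 037 (Python's "cp037" codec) as a stand-in for EBCDIC; real mainframes use several EBCDIC code pages, so treat the choice as illustrative.

```python
text = "Hello, 123"

ascii_bytes = text.encode("ascii")
ebcdic_bytes = text.encode("cp037")  # assumption: code page 037 stands in for EBCDIC

print(ascii_bytes.hex(" "))   # the ASCII byte values
print(ebcdic_bytes.hex(" "))  # the same text, entirely different byte values

# Decoding with the matching codec recovers the text.
print(ebcdic_bytes.decode("cp037"))
```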
Dave Spacey
--
Don't underestimate the abacus......it requires no power, can be made with any materials you have to hand, and never goes bing in the middle of an important piece of work. (Many thanks to Douglas Adams.)
+++++++++++++++++++++++++++
From rac@intrigue.com (Robert Coie)
Date: Wed, 29 Nov 1995 13:02:47 -0800
Organization: Intrigue Corporation
In article <48sjna$3fg@nef.ens.fr>, pottier@trimaran.ens.fr (Francois
Pottier) wrote:
: Apple also screwed it up by making its Japanese two-byte character set
: incompatible with its usual 8-bit character set, i.e. you can't have
: French accents on a Mac Japanese system.
Agreed, you run into trouble with Japanese fonts, but the Japanese system
has all the Roman fonts of the U.S. system. Does the French system have
extra fonts missing from U.S. English? In any case, French accents have
worked perfectly well for me on every KanjiTalk version I have used (back
to 6.0) as long as I make sure to use Roman fonts.
I think you're being too hard on Apple: I support the choice of Shift-JIS
for Japanese encoding because it allows context-free mixing of 7-bit ASCII
and Japanese and was largely compatible with the NEC 9800 series, which
had a huge share of the personal computer market in Japan when KanjiTalk
1.0 came out. Using the >$80 characters for accents &c was also IMHO the
right decision when it was made.
The problem is that mixing text that uses the >$80 range in radically
different ways is not tenable unless extra information is stored, such as
a script code (as has been possible since Styled TextEdit and the Script
Manager made their debut). Until all characters are at least two bytes
wide (as in Unicode), such conflicts are unavoidable, and the Mac OS does
the best job of minimizing the impact of these difficulties of any system
I have worked with.
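
A small Python sketch of why the extra information matters. It is not how the Script Manager works; the codec name attached to each run simply plays the role of a script code, and decode_runs is a hypothetical helper.

```python
def decode_runs(runs):
    """Decode (codec, bytes) pairs, each with the codec its tag names."""
    return "".join(raw.decode(codec) for codec, raw in runs)


french = "café".encode("mac_roman")    # the accented letter is one byte >= 0x80
japanese = "日本".encode("shift_jis")  # two bytes per character, lead byte >= 0x80

print(french.hex(" "), "|", japanese.hex(" "))

# Without a tag, the Japanese bytes read as unrelated Roman characters.
print(japanese.decode("mac_roman"))

# With a tag per run, the mixed text decodes cleanly.
print(decode_runs([("mac_roman", french), ("shift_jis", japanese)]))

# 7-bit ASCII letters are the same bytes in both encodings, so they mix freely.
print("abc".encode("mac_roman") == "abc".encode("shift_jis"))
```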
Before DOS/V (2 or 3 years ago), you couldn't run English DOS on most
Japanese PCs, and lots of English software just wouldn't run. With DOS/V
you have to restart to switch languages. English Windows 3.1 won't run on
top of Japanese DOS and vice versa, and many English Windows apps won't
run under Japanese Windows, so again you have to restart frequently to
switch languages. Some Japanese characters are forbidden in DOS
filenames. I don't know how Windows 95 handles these issues (the Japanese
version just shipped last week). With Japanese X-Windows, the burden for
setting scripts lies largely on the user, and Unix Japanese character
encoding varies from vendor to vendor. Moving documents from DEC Ultrix
to a Sun requires reencoding all Japanese text, for example.
On the Mac OS, one can switch from writing English to Japanese with one
keystroke, mix Japanese, English, and French in the same document, and
never have to restart. Although there are some exceptions and annoyances,
virtually all software written to run under the U.S. Mac OS at least works
under KanjiTalk, and most programs allow entry of Japanese even if the
programmer made no effort at all to support it.
Robert Coie
Implementor, Intrigue Corporation
rac@intrigue.com
+++++++++++++++++++++++++++
From pottier@trimaran.ens.fr (Francois Pottier)
Date: 30 Nov 1995 01:26:19 GMT
Organization: Ecole Normale Superieure, Paris
In article <rac-2911951302470001@intrigue.intrigue.com>,
Robert Coie <rac@intrigue.com> wrote:
>I think you're being too hard on Apple:
Um, you're right, this solution was the best one since it allowed mixing
7 bit ASCII characters with Japanese. And French accents can be obtained
by explicitly selecting a Roman font. The only remaining problem is that
you can't have French accents in file names, because the Japanese Finder
uses a Japanese font for file names.
Thanks for setting things straight,
--
Francois
pottier@dmi.ens.fr
http://www.eleves.ens.fr:8080/home/pottier/
+++++++++++++++++++++++++++
From bill@scconsult.com (Bill Stewart-Cole)
Date: Tue, 21 Nov 1995 23:17:43 -0600
Organization: ZOG
In article <scipioni-2011951231090001@scipioni.nai.net>, scipioni@nai.net
(Steve Scipioni) wrote:
>What is the difference between UNIX, DOS, and Mac text format? I thought
>that ASCII was ASCII.
ASCII is ASCII: a 7-bit encoding. Past 127 in an 8-bit byte, what you
get can vary, a lot. But that's not really what you are talking about.
>I'm writing this in BBEdit, and will save it as Mac
>format, which is how I have the program set up as default. I'll then open
>it in Newswatcher, and send it off to this newsgroup. I also use BBEdit to
>write HTML documents, and my HTML seems to load just fine once it's on the
>server (a UNIX system).
>
>I have come to understand that the primary difference between the way the
>three OS's handle text is by way of "end of line conventions." Just how
>does a line feed differ from a carriage return? And is it true that the
>Mac uses both? How is text rendered on other systems when it encounters a
>file that was created on a different system that uses a convention
>different from its own?
This is actually the real area of variation.
The Mac uses a carriage return (ASCII 13) to end lines. Actually, the more
common reality is that NOTHING ends lines on a Mac, because your text
usually wraps to a window. CR is more commonly a paragraph ending. This is
why with some plain text editors you get huge lines when opening a text
file made with SimpleText or many word processors.
DOS uses BOTH: a carriage return, then a line feed (ASCII 10), to break
lines. With Windows, this often really means paragraph breaks.
Unix systems store a single line feed (LF, ASCII 10) at the end of each
line on disk. What Unices (plural of Unix?) put on the wire or the screen
can still differ from that, because there is a defined standard for the
"network virtual terminal" emulation used to do things like telnet, and it
is CR+LF. In addition, every other text-based Unix session is done thru the
tty devices, which get associated with whatever other terminal emulation is
used. Hence, no matter what a file looks like on disk, when you show it on
your screen thru a terminal session, you get whatever the terminal
emulation says is the right line break.
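
Translating between the on-disk conventions is mechanical. Here is a minimal Python sketch, assuming the usual conventions for files on disk: a lone CR on the Mac, CR+LF on DOS, and a lone LF on Unix. The names are made up for illustration.

```python
import re

LINE_ENDINGS = {"mac": b"\r", "dos": b"\r\n", "unix": b"\n"}
NAMES = {eol: name for name, eol in LINE_ENDINGS.items()}

# CR+LF is listed first so that it is matched as one line break, not two.
_ANY_EOL = re.compile(rb"\r\n|\r|\n")


def convert_line_endings(data: bytes, target: str) -> bytes:
    """Rewrite every CR+LF, lone CR, or lone LF as the target convention."""
    return _ANY_EOL.sub(LINE_ENDINGS[target], data)


def guess_convention(data: bytes) -> str:
    """Guess the convention from the first line break found."""
    match = _ANY_EOL.search(data)
    return "none" if match is None else NAMES[match.group()]


mac_text = b"first line\rsecond line\r"
print(guess_convention(mac_text))
print(convert_line_endings(mac_text, "dos"))
```

Working on bytes rather than decoded text keeps the sketch independent of whatever the characters above 127 mean.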
The origin of this is the venerable teletype. On a teletype, a carriage
return is a distinct function from a line feed. Remember manual
typewriters? (No? am I getting OLD?) The movement of the carriage back to
the start is not necessarily accompanied by a roll of the platen, and vice
versa. Teletypes are not, like video terminals, dependent on lines that
are filled to the left. Using CR and LF independently, things like
strikethru and doublestrike text and data-efficient
transmissions of things
                       like
                           this
were possible (without lots
of spaces sent at 75 baud). Hence the line break became 2 characters in
terminal-oriented operating systems.
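
The independence of the two functions can be sketched with a toy print-head simulation in Python. It is a simplified model, not any real teletype's behavior: CR only resets the column, LF only advances the row, and a cell remembers every character struck on it.

```python
def teletype(stream):
    """Return, for each row, the characters struck in each column."""
    rows = [[]]           # rows[r][c] holds all characters struck at (r, c)
    row, col = 0, 0
    for ch in stream:
        if ch == "\r":    # carriage return: back to column 0, same row
            col = 0
        elif ch == "\n":  # line feed: down one row, same column
            row += 1
            if row == len(rows):
                rows.append([])
        else:             # printable: strike the paper and advance
            line = rows[row]
            while len(line) <= col:
                line.append("")
            line[col] += ch
            col += 1
    return rows


def render(rows):
    """Show the last character struck in each cell; untouched cells are blank."""
    return "\n".join(
        "".join(cell[-1] if cell else " " for cell in line) for line in rows
    )


print(render(teletype("things\nlike\nthis")))  # LF alone gives a stair-step
print(teletype("abc\r---")[0])                 # CR alone allows overstriking
print(render(teletype("ab\r\ncd")))            # CR+LF starts a fresh line
```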
--
Bill Stewart-Cole
What is Stewart-Cole Consulting?
Hell if I know. I'll find out when I finish the web page.
Current projected date: 12/1. I'm not saying what year